Gated Recurrent Context: Softmax-Free Attention for Online Encoder-Decoder Speech Recognition
Abstract
Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). Because the original AED models with global attention are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for a better user experience. However, a common limitation of the conventional softmax-based online attention approaches is that they introduce an additional hyperparameter related to the length of the attention window, which requires multiple trials of model training to tune. To deal with this problem, we propose a novel softmax-free attention method and a modified formulation of it for online attention, which does not need any additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate that the tradeoff between the latency and the performance of the proposed online attention technique can be controlled by merely adjusting a threshold at the test phase. Furthermore, the proposed methods showed performance competitive with conventional global and online attention in terms of word error rates (WERs).
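The abstract does not give the equations, so the following Python sketch is only an illustration under stated assumptions, not the paper's formulation. It contrasts conventional global softmax attention, whose weights are normalized over every encoder frame and therefore need the whole utterance, with a generic softmax-free gated recurrence: a sigmoid gate z_j = sigmoid(q . h_j) decides how much of frame h_j is blended into a running context, c_j = (1 - z_j) c_{j-1} + z_j h_j. The dot-product energy, the use of the encoder states as both keys and values, the function names, and the online stopping rule (stop reading frames once a gate reaches a test-time threshold) are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(query: Vector, keys: List[Vector]) -> Vector:
    """Conventional global attention: weights are softmax-normalized over ALL frames,
    so the context cannot be computed before the last frame has arrived."""
    scores = [dot(query, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def gated_recurrent_context(
    query: Vector, keys: List[Vector], threshold: Optional[float] = None
) -> Tuple[Vector, int]:
    """Hypothetical softmax-free context: c_j = (1 - z_j) * c_{j-1} + z_j * h_j,
    with z_j = sigmoid(query . h_j) and c_0 = 0. No normalization over future frames
    is needed. If `threshold` is given (hypothetical online rule), stop reading frames
    as soon as a gate reaches it. Returns (context, number of frames read)."""
    context = [0.0] * len(keys[0])
    frames_read = 0
    for h in keys:
        z = sigmoid(dot(query, h))
        context = [(1.0 - z) * c + z * x for c, x in zip(context, h)]
        frames_read += 1
        if threshold is not None and z >= threshold:
            break
    return context, frames_read


# Usage: a lower threshold stops reading encoder frames earlier (lower latency).
frames = [[0.1, 0.0], [0.0, 0.2], [2.0, 0.0], [0.0, 3.0]]
q = [1.0, 0.0]
for tau in (None, 0.8, 0.5):
    _, n = gated_recurrent_context(q, frames, threshold=tau)
    print(tau, n)  # prints "None 4", then "0.8 3", then "0.5 1"
```

Because this recurrence never normalizes over frames it has not yet seen, its implied weights z_j * prod_{k>j}(1 - z_k) sum to 1 - prod_k(1 - z_k), which is at most 1 rather than exactly 1, and the context can be updated as frames arrive. In the sketch, the threshold is applied only at inference, so changing it trades the number of frames read against how much of the utterance contributes to the context without retraining.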
Similar resources
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition
Deep neural networks have advanced the state-of-the-art in automatic speech recognition, when combined with hidden Markov models (HMMs). Recently there has been interest in using systems based on recurrent neural networks (RNNs) to perform sequence modelling directly, without the requirement of an HMM superstructure. In this paper, we study the RNN encoder-decoder approach for large vocabulary ...
A Recurrent Encoder-Decoder Network for Sequential Face Alignment
We propose a novel recurrent encoder-decoder network model for real-time video-based face alignment. Our proposed model predicts 2D facial point maps regularized by a regression loss, while uniquely exploiting recurrent learning at both spatial and temporal dimensions. At the spatial level, we add a feedback loop connection between the combined output response map and the input, in order to ena...
Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference
The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task. This paper describes our system (alpha) that is ranked among the top in the Shared Task, on both the in-domain test ...
Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks
With the rising number of interconnected devices and sensors, modeling distributed sensor networks is of increasing interest. Recurrent neural networks (RNN) are considered particularly well suited for modeling sensory and streaming data. When predicting future behavior, incorporating information from neighboring sensor stations is often beneficial. We propose a new RNN based architecture for c...
A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis
Current approaches to statistical parametric speech synthesis using neural networks generally require input at the same temporal resolution as the output, typically a frame every 5 ms, or in some cases at waveform sampling rate. It is therefore necessary to fabricate highly redundant frame-level (or sample-level) linguistic features at the input. This paper proposes the use of a hierarchical enco...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2021
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2021.3049344